Retrospective judgments of confidence in a complex span task

Although the study of metamemory monitoring originated with predictions for simple span tasks, metacognition for working memory (WM) has been somewhat neglected compared with long-term memory. We aimed to fill this gap by exploring the ability to self-assess WM operations. Thirty-four participants performed 16 series of complex span tasks and rated their confidence in a verbal recall paradigm. We manipulated cognitive load based on the time-based resource-sharing (TBRS) model in order to analyze the role of attentional resources in both WM and metacognitive evaluations. As expected, recall was affected by cognitive load, and we found standard serial position effects. Interestingly, metacognitive evaluations followed the same pattern, and measures of metacognitive sensitivity suggest that participants are able to make item-by-item retrospective judgments reflective of their performance. We discuss how these results contribute to our understanding of metacognitive access to newly formed WM contents.

performed on the material held in WM. However, it is essential to elucidate the functional differences between WM and LTM, as these differences can manifest in participants' performance and maintenance strategies [23][24][25][26][27]. Indeed, monitoring LTM information in a metacognitive task must differ from monitoring newly formed information held in WM.
Whereas WM can be thought of as an essential component in holding in mind information for which we want to make a metacognitive judgment, as implied in theoretical accounts of metacognition, here our emphasis is more directly on the appraisal of the information held in WM in the context of a relatively process-pure WM task. Importantly, recent research has directly addressed this question, yielding valuable outcomes. This second group of studies measures participants' evaluations of their own WM function. This approach allows metacognitive monitoring to be examined by determining whether individuals' metacognitive evaluations align with their objective performance in typical WM tasks.
Metacognitive monitoring is demonstrated when people's metacognitive evaluations are appropriate given their actual performance. To this end, studies using simple span tasks have measured metacognitive bias and sensitivity. Bias refers to the difference in magnitude between confidence and performance, that is, whether participants are under- or overconfident. Sensitivity indicates the extent to which metacognitive evaluations distinguish the participant's correct responses from their incorrect ones [28][29][30].
In one of the earliest studies that explored access to WM, Murphy et al. 31 explored whether participants could predict their performance in a serial recall task inspired by Flavell's pioneering work in metamemory (e.g., 32). In their study, participants estimated their span (i.e., the number of items they could successfully recall) for a set of line drawings. They were first presented with two items and asked "could you remember this many?". A longer list was presented each time the participant reported that they could remember the list, and the estimated span was the longest list that the participant predicted being able to remember. After this prediction phase, participants carried out a span task. The primary aim of this study was to compare younger and older adults, to see whether metamemory function could explain memory dysfunction in the older group, but no differences were found between groups in the accuracy of their predictions.
Metacognitive judgments map onto robust effects observed in simple-span tasks, such as the serial position effect (see [33][34][35][36] for reviews of the serial position effect). It has also been observed that predictions of capacity in simple-span tasks with words differ before and after the encoding phase and depend on the nature of the words to be recalled (i.e., long or short, phonologically similar or dissimilar) 37. In other words, metacognitive judgments are affected by the word-length and phonological similarity effects, showing a pattern analogous to recall (see 38). Likewise, predictions of capacity in simple-span tasks with images differ before and after the encoding phase. For instance, Bertrand et al. 39 observed that predictions made after the encoding phase showed lower metacognitive bias than predictions made before it. Neuropsychological data show that patients with Alzheimer's disease seem capable of adjusting their metacognitive evaluations for image span tasks and are as accurate as healthy older adults 40. These outcomes suggest that metacognitive evaluations are updated following the encoding experience. However, the problem with the use of images (of objects) and (real) words in these studies is that they encourage strategies that broadly involve LTM knowledge.
It is worth noting that these previous studies used either only global judgments or a variation of them. Global judgments are used to study the ability to evaluate overall performance on a task, unlike local judgments, which are made on each item and allow a more fine-grained analysis of monitoring 41,42. The problem is that global predictions are limited: they allow only metacognitive bias to be measured, neglecting sensitivity 43,44. Furthermore, since global judgments evaluate overall performance, they tend to target WM capacity rather than awareness of the current contents of WM. The difference between capacity and content is crucial. WM capacity refers to the ability to maintain a certain amount of information 23,25,45. In contrast, WM content refers to the representations in WM that are supposed to "be directly accessible for conscious inspection" (p. 510, 46). Thus, global judgments in simple span tasks might be inadequate for investigating metacognitive access to WM representations themselves.
Taking a different approach, Reyes and Sackur 47 provide relevant findings highlighting the ability to introspect on the contents of WM. In their study, participants performed a probe recognition task within a simple span paradigm and, after identifying the probe, estimated the number of items they had needed to scan. The researchers manipulated retrieval instructions, prompting participants either to identify recent items relative to a distractor or to determine whether an item was present. Notably, both performance and estimations varied similarly depending on the retrieval instructions, with the identification of recent items being strongly influenced by serial position. Crucially, participants' subjective evaluations of how far back they needed to scan WM to identify a target were related to the actual serial position, suggesting access to information in WM for such introspective evaluations.
To investigate metacognitive access to WM representations, local judgments appear to be the most appropriate method. Retrospective judgments of confidence are typically collected as item-by-item judgments (i.e., local judgments 48). Metacognitive awareness of the ongoing task is described as second-order performance, and the task itself as first-order performance 49. Sahar et al. 50 used such a procedure, asking participants to memorize images presented one by one at different locations. Participants made a spatial recognition judgment for a probed item and then rated their confidence in their choice of the item's location. In a second experiment, participants first responded whether the probed item had been displayed, then rated their confidence, and finally performed the location judgment. They also contrasted real objects with distorted objects. Overall, the results showed moderate metacognitive sensitivity to visual WM performance and a robust serial position effect on both WM and confidence.
Inspired by this approach, we set out to explore the same issues for verbal WM within a serial recall paradigm. Specifically, we aimed to determine whether people have metacognitive access to their WM contents within a complex span task. To better investigate metacognitive awareness of ongoing processes, we also manipulated cognitive load (CL). According to the time-based resource-sharing (TBRS) model 13, WM relies on a single, limited attentional resource: when attention is occupied by processing, it is no longer available to maintain information, and the activation of the to-be-remembered items decreases, which translates into a gradual decay of their memory traces [51][52][53]. CL is defined as the proportion of time during which attention is captured by the processing task. Thus, higher CL impairs WM performance, measured as the number of items recalled 54,55.
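As an illustration of this definition, TBRS studies commonly write CL as aN/T, where a is the time during which each processing step captures attention, N is the number of steps, and T is the total time available. The minimal Python sketch below plugs in the timing given in the Procedure (six digits per inter-letter interval, 711 + 356 ms each). The capture time a was not measured in this study, so, purely as an illustrative assumption, the sketch uses the median response times reported in the Results as a stand-in; the resulting values are rough and are not estimates reported by the study.

```python
def cognitive_load(capture_ms: float, n_steps: int, total_ms: float) -> float:
    """TBRS-style cognitive load a*N/T: share of total_ms during which attention is captured."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return capture_ms * n_steps / total_ms


DIGIT_SLOT_MS = 711 + 356                          # display + blank per digit = 1067 ms
DIGITS_PER_LETTER = 6
INTERVAL_MS = DIGITS_PER_LETTER * DIGIT_SLOT_MS    # 6402 ms inter-letter interval

# Assumption: median RTs approximate the attentional capture time per digit.
median_rt_ms = {"parity (high CL)": 618, "location (low CL)": 445}

for task, rt in median_rt_ms.items():
    cl = cognitive_load(rt, DIGITS_PER_LETTER, INTERVAL_MS)
    print(f"{task}: CL ~ {cl:.2f}")
```

Under these assumptions the parity task gives a CL of about 0.58 and the location task about 0.42, which is at least consistent with parity judgment being the higher-load condition.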
In our experiment, on each trial, participants memorized a series of six letters while performing a processing task after each letter. The processing task was performed on digits, for which participants had to judge either their location on the screen or whether they were odd or even. These two tasks resulted in a low and a high CL condition, respectively 56. We calculated two WM scores: (a) item recall and (b) strict recall. Item recall scored as correct any recalled item that had been presented, regardless of its original position, whereas strict recall counted as correct only items recalled in their original position. Immediately after recall, participants made two item-by-item confidence judgments regarding (a) only the retrieved item and (b) the retrieved item and its position. This allowed us to evaluate metacognitive sensitivity for the two WM measures.
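The two scoring rules can be sketched as follows. This is a minimal illustration on a hypothetical series; the function names are ours, and the sketch assumes each recalled letter is scored independently (how a letter recalled twice was handled is not specified in the text).

```python
def item_recall(presented: str, recalled: str) -> list[int]:
    """Score 1 for each recalled letter that appeared anywhere in the presented series."""
    return [int(letter in presented) for letter in recalled]


def strict_recall(presented: str, recalled: str) -> list[int]:
    """Score 1 only when the recalled letter matches the presented letter at that position."""
    return [int(p == r) for p, r in zip(presented, recalled)]


presented = "BFKMQT"  # hypothetical six-consonant series
recalled = "BKFMXT"   # hypothetical response: F and K swapped, Q replaced by X
print(item_recall(presented, recalled))    # [1, 1, 1, 1, 0, 1]
print(strict_recall(presented, recalled))  # [1, 0, 0, 1, 0, 1]
```

The swapped letters count as correct under item scoring but not under strict scoring, which is why the two measures can diverge.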
We expected to reproduce the effects of CL and serial position on WM scores. Most importantly, these effects should also be observed in metacognitive evaluations. If people have metacognitive access to WM contents during these complex span tasks, the magnitude of confidence should follow the same pattern as WM scores: both recall and confidence should be lower in the higher CL condition and should show primacy and recency effects. We hypothesized that participants would, in general, be able to accurately monitor their performance on an item-by-item basis using subjective metacognitive evaluations. Moreover, a measure of metacognitive sensitivity should reveal a reliable relationship between performance and metacognitive evaluation. Our hypotheses and analyses were preregistered and can be found at https://osf.io/c8qfg.

Results
Data are available at https://osf.io/hsb9e/. We tested thirty-four participants. Each participant performed 16 six-letter trials, resulting in 544 trials in total (272 per CL condition). Following typical TBRS analyses, we discarded trials that did not reach the criterion of 80% correct key presses in the processing task, to ensure that participants actually managed to perform the dual task. This procedure resulted in 4.6% of trials being discarded, but all participants contributed data to the final analysis. In total, we analyzed 254 trials in the high CL condition and 265 trials in the low CL condition. A first comparison between the response times of the parity and location judgments provided a manipulation check of cognitive load. A Wilcoxon signed-rank test confirmed that the parity judgment elicited longer RTs (median = 618 ms, IQR = 600-648) than the location judgment (median = 445 ms, IQR = 423-488), z = −5.012, p ≤ 0.001.
Participants reported confidence for each of the six positions in each trial, yielding a total of 1524 and 1590 performance-confidence observations for the high and low CL conditions, respectively. We computed two WM scores, item recall and strict recall, as described above, and participants provided corresponding metacognitive evaluations: retrospective confidence in item recall and retrospective confidence in strict recall.
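The exclusion rule and the resulting counts can be checked with a short sketch. It assumes that the 80% criterion is computed over the 36 processing responses of a series (6 letters × 6 digits, per the Procedure); `passes_criterion` is a hypothetical helper, not part of the authors' scripts.

```python
def passes_criterion(correct: int, total: int, criterion: float = 0.80) -> bool:
    """True if processing-task accuracy on a trial reaches the criterion."""
    return total > 0 and correct / total >= criterion


# Assumption: each series has 6 letters x 6 digits = 36 processing responses.
DIGITS_PER_TRIAL = 6 * 6
print(passes_criterion(29, DIGITS_PER_TRIAL))  # True  (29/36 ~ 0.806)
print(passes_criterion(28, DIGITS_PER_TRIAL))  # False (28/36 ~ 0.778)

total_trials = 34 * 16                         # 544 trials in total
kept = {"high CL": 254, "low CL": 265}
discarded = total_trials - sum(kept.values())
print(f"discarded {discarded} of {total_trials} trials ({discarded / total_trials:.1%})")

# One performance-confidence observation per serial position.
observations = {cond: n * 6 for cond, n in kept.items()}
print(observations)  # {'high CL': 1524, 'low CL': 1590}
```

The arithmetic reproduces the reported 4.6% exclusion rate (25 of 544 trials) and the 1524 and 1590 observations per condition.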
Data were analyzed using R (version 4.2.0) 57. For WM scores, we used the lme4 package (version 1.1-30) 58 and the function glmer to fit two logistic regressions as generalized linear mixed models (GLMM), which are suited to binary responses. Mixed-effects analyses improve the generalizability of results to other participants 59. Moreover, since the effect of CL can vary across participants, mixed models allow us to account for variability attributable to individual differences. We used the method proposed by Bates et al. 60 to construct parsimonious mixed models and prevent convergence problems.
For metacognitive evaluations, we applied two ordinal logistic regressions, since confidence was collected on ordinal scales, using the ordinal package (version 2019.12-10) 61,62. These ordinal logistic regression models evaluated whether the magnitude of confidence for both strict recall and item recall was affected by CL and serial position. The analysis script is available at https://osf.io/pbktn/. Finally, in keeping with other studies in the literature, we computed gamma correlations to measure metacognitive sensitivity. These correlations compare participants' confidence with the number of items they recall or fail to recall at each level of confidence 30,63 (for a critique, see 29).
Figure 1 shows the mean levels of performance and retrospective confidence for each serial position and each CL condition. The left-hand panel (a) shows the data for item recall scoring, and the right-hand panel (b) shows the data for strict recall scoring. To examine the effects of serial position, we compared models that included serial position with models that did not (see Judd et al., 2017). Including serial position as a fixed effect significantly improved model fit for both item recall (χ² = 32.58, df = 5, p < 0.001) and strict recall (χ² = 62.53, df = 5, p < 0.001). These findings are illustrated in Fig. 1A,B, respectively. Comparisons of estimated marginal means revealed that items in first position were better recalled than those in third, fourth, fifth, and sixth position for both item recall and strict recall (all p-values < 0.01). Additionally, items in sixth position were better recalled than those in fourth and fifth position (all p-values < 0.01) for both item recall and strict recall. Results for serial position on strict and item recall were averaged over the levels of CL, and the Tukey method was used to adjust p-values for the family of six estimates. These outcomes indicate typical primacy and recency effects in WM: items in the first and last positions were better recalled than those in the middle of the series, regardless of the scoring method. Model comparison did not detect any interaction between our factors for either item or strict recall.
www.nature.com/scientificreports/
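The model comparisons rely on likelihood ratio tests: twice the difference in log-likelihood between nested models is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters (here 5, for six serial positions coded as a factor). The sketch below uses only the Python standard library, not the R tools used in the study; it implements the chi-square upper tail for integer degrees of freedom and applies it to the reported statistics. The helper names and the example log-likelihoods are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus a finite correction series.
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # N(0, 1) density at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float, df: int):
    """Compare nested models: statistic = 2 * (logLik_full - logLik_reduced)."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


stat, p = likelihood_ratio_test(-100.0, -116.29, 5)  # hypothetical log-likelihoods
# Reported statistics for adding serial position (5 extra parameters).
for scoring, reported in [("item recall", 32.58), ("strict recall", 62.53)]:
    print(f"{scoring}: chi2(5) = {reported}, p = {chi2_sf(reported, 5):.1e}")
```

Both reported statistics lie far above the chi-square(5) critical value for p = 0.001 (about 20.5), consistent with the p < 0.001 reported above.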

Effects of cognitive load and serial position on retrospective confidence
Two ordinal logistic models were used to investigate the effects of CL and serial position on retrospective confidence in item recall and strict recall. Both CL and serial position significantly affected both metacognitive judgments. As shown in Fig. 1A, participants were less confident in item recall in the parity judgment condition (M = 75.34%, SD = 20.71) than in the location judgment condition (M = 82.80%, SD = 17.49), β = −0.58, z = −6.15, p < 0.001. Likewise, Fig. 1B shows that participants were less confident in strict recall in the parity judgment condition (M = 61.36%, SD = 26.18) than in the location judgment condition (M = 70.68%, SD = 23.02), β = −0.63, z = −7.32, p < 0.001. For both WM measures, retrospective confidence thus followed the same pattern as recall performance. Serial position also had a significant effect on retrospective confidence in both item recall and strict recall: models including serial position as a fixed effect fit significantly better than models without it for both item recall (likelihood ratio test = 218.88, p < 0.001) and strict recall (likelihood ratio test = 62.57, p < 0.001). Comparisons of estimated marginal means showed that participants were more confident for items in the first position than for items in other positions for both types of recall (all p < 0.03). Additionally, participants were more confident for items in the sixth position than for items in the fourth and fifth positions (all p < 0.01). No interaction effect was observed in any of our retrospective confidence measures.
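To help interpret the β values, note that R's ordinal package fits cumulative link models parameterized as logit P(Y ≤ j) = θ_j − η, so a negative coefficient shifts probability toward lower confidence categories, and exp(β) multiplies the odds of reporting a higher confidence level. The sketch below assumes the location judgment is the reference level (η = 0) and uses made-up thresholds θ_j, since the fitted thresholds are not reported; only the β values come from the text.

```python
import math

LEVELS = [0, 20, 40, 60, 80, 100]  # the six confidence options


def category_probs(thresholds: list[float], eta: float) -> list[float]:
    """P(Y = j) under a cumulative logit model with logit P(Y <= j) = theta_j - eta."""
    cdf = [1 / (1 + math.exp(-(theta - eta))) for theta in thresholds] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


def expected_confidence(thresholds: list[float], eta: float) -> float:
    """Mean confidence level implied by the category probabilities."""
    return sum(level * p for level, p in zip(LEVELS, category_probs(thresholds, eta)))


# Hypothetical increasing thresholds (five cut points for six levels), not fitted values.
THETA = [-4.0, -3.0, -2.0, -1.0, 0.0]

for scoring, beta in [("item recall", -0.58), ("strict recall", -0.63)]:
    low_cl = expected_confidence(THETA, 0.0)    # location judgment as reference (eta = 0)
    high_cl = expected_confidence(THETA, beta)  # parity judgment (eta = beta)
    print(f"{scoring}: odds ratio {math.exp(beta):.2f}, "
          f"expected confidence {low_cl:.1f} -> {high_cl:.1f}")
```

With these coefficients the odds ratios are about 0.56 (item recall) and 0.53 (strict recall): under the parity task, the odds of choosing a higher confidence level drop by roughly 45%. The expected-confidence values printed depend entirely on the hypothetical thresholds.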

Metacognitive bias
On average, participants were underconfident for item recall in both the parity judgment condition (confidence minus actual performance = −3.64 percentage points) and the location judgment condition (−2.49 percentage points). In contrast, for strict recall, participants were overconfident in both the parity judgment condition (+8.71 percentage points) and the location judgment condition (+5.11 percentage points).
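Bias as reported here is mean confidence minus percent correct, in percentage points. A minimal sketch on hypothetical data follows; the text does not state whether bias was computed per participant and then averaged or on pooled judgments, so the function simply handles one set of judgments.

```python
def calibration_bias(confidence: list[float], correct: list[int]) -> float:
    """Mean confidence (0-100 scale) minus percent correct, in percentage points.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    if not confidence or len(confidence) != len(correct):
        raise ValueError("need equally long, non-empty sequences")
    mean_confidence = sum(confidence) / len(confidence)
    percent_correct = 100 * sum(correct) / len(correct)
    return mean_confidence - percent_correct


# Hypothetical single trial: six strict-recall judgments and their scores.
conf = [100, 80, 80, 60, 40, 100]
score = [1, 1, 1, 0, 0, 1]
print(round(calibration_bias(conf, score), 2))  # 10.0
```

Here the six judgments average about 76.7% confidence against 66.7% correct, giving a bias of +10 points, i.e. overconfidence.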

Metacognitive sensitivity
For each participant, a Gamma correlation coefficient was calculated between recall and retrospective confidence 30, collapsed across all serial positions. We calculated a 2 × 2 gamma, using a median split for each participant to categorize high and low confidence judgments and crossing this with the recall status (correct or incorrect) of each response. The median gamma was 0.91 (IQR = 0.83-0.95) for item recall and 0.85 (IQR = 0.78-0.91) for strict recall, suggesting that participants were able to accurately judge their WM performance. To support these results, and given criticisms of gamma correlations, we performed an additional analysis that was not preregistered: ordinal regressions predicting confidence from recall status and CL (see Fig. 2). Here we treat recall status as an independent variable, on the logic that if confidence judgments reflect the ability to monitor performance, recall status should have an impact on confidence. Our model confirms significant effects of recall status and cognitive load on confidence (see Table 1).
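A minimal sketch of the 2 × 2 gamma described above follows, on hypothetical data. For a 2 × 2 table, Goodman-Kruskal gamma reduces to Yule's Q, (ad − bc)/(ad + bc). How judgments equal to the median were classified is not reported; the sketch assumes they count as low confidence, a choice that matters on a six-point scale where ties are common.

```python
from statistics import median


def gamma_2x2(confidence: list[float], correct: list[int]) -> float:
    """Goodman-Kruskal gamma on a 2x2 table (high/low confidence x correct/incorrect).

    For a 2x2 table this equals Yule's Q: (ad - bc) / (ad + bc).
    """
    cut = median(confidence)
    # Assumption: judgments equal to the median are classed as "low" confidence.
    high = [c > cut for c in confidence]
    ok = [bool(k) for k in correct]
    a = sum(h and k for h, k in zip(high, ok))              # high confidence, correct
    b = sum(h and not k for h, k in zip(high, ok))          # high confidence, incorrect
    c = sum(not h and k for h, k in zip(high, ok))          # low confidence, correct
    d = sum(not h and not k for h, k in zip(high, ok))      # low confidence, incorrect
    concordant, discordant = a * d, b * c
    if concordant + discordant == 0:
        return float("nan")  # undefined, e.g. when all judgments fall on one side
    return (concordant - discordant) / (concordant + discordant)


conf = [100, 100, 80, 60, 40, 20, 100, 80]  # hypothetical judgments
score = [1, 0, 1, 0, 0, 0, 1, 0]
print(round(gamma_2x2(conf, score), 3))  # 0.778
```

In this example the table is a = 2, b = 1, c = 1, d = 4, so gamma = (8 − 1)/(8 + 1) ≈ 0.78.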

Discussion
The main goal of this study was to determine whether participants have metacognitive access to WM during a complex span task.In contrast to previous studies that primarily utilized global judgments or a probe recognition task to examine the metacognition of WM, the current study employed a local judgment approach, whereby we asked for a confidence judgment for each item retrieved, enabling the comparison of primacy and recency effects observed in serial recall.Moreover, we manipulated CL, as this produces a robust and well-documented effect on verbal WM performance.We thus compared the patterns for recall and confidence according to benchmark effects typically observed in verbal WM and targeted the awareness of the contents of WM rather than its capacity.We replicated CL and serial position effects on both WM scores.Participants recalled fewer items in the parity judgment condition (i.e., higher CL) than in the location judgment condition (i.e., lower CL).The parity judgment task requires more attentional resources to be performed; consequently, attentional resources are less available to maintain information in WM 54 .Moreover, items in first position were better recalled than those in other positions, and those in sixth position were better recalled than those in fourth and fifth positions, yielding a clear pattern of primacy and recency 36 .
Importantly, both confidence measures followed a pattern similar to recall. Participants exhibited lower levels of confidence in the higher CL condition than in the lower CL condition. Furthermore, in line with previous studies 47,50, primacy and recency effects were mirrored in metacognitive evaluations. Participants were more confident for items recalled in the first position than for items in other positions. Likewise, they were more confident for items recalled in the sixth position than for items in the fourth and fifth positions.
These analogous patterns between retrospective confidence and WM scores suggest an association between metacognitive evaluations and recall. This association was underlined by high levels of metacognitive sensitivity as measured by Gamma, meaning that participants accurately distinguished between correctly and incorrectly recalled items in their confidence judgments. The ordinal regression models also showed that recall status reliably predicted confidence for both strict recall and item recall, supporting a strong association between first- and second-order performance. Altogether, our data show that participants were able to adjust their confidence according to the likelihood of their recall being correct.
There are a number of theoretical proposals that explain how metacognitive judgments access memory, mostly deriving from the LTM literature, which we summarize here with reference to our task. A first issue to consider is how our methodology affects metacognitive evaluations, and the extent to which our retrospective confidence judgments tap into representations held in WM. In the current experiment, the representations of the letter series may already have disappeared from WM by the time the judgments were made 23, or at least, they were not necessary to provide a confidence response, since all participants were shown a reminder of their recall, albeit immediately after their recall attempt. Therefore, one could argue that the metacognitive evaluations were based not on direct online awareness of the WM system but on the retrieval experience during recall. However, even then, some record of how recall was achieved must remain accessible to influence the metacognitive evaluation.
Whether or not representations are still in WM when the confidence judgment is made, our results are consistent with the first-order/second-order approach to metacognition 49. Recall is the first-order behavior, and retrospective confidence judgments on WM outputs are the second-order behavior. Retrospective judgments refer to the level of confidence a participant has in being correct on a given first-order response.
As stated by Koriat 64, local retrospective judgments, such as those used in our study, are primarily rooted in the experience of the first-order process and are influenced by factors such as fluency and ease of processing or retrieval [41][42][43]. Across cognitive domains, experience-based metacognition can be supported by an architecture that reliably links first- and second-order behaviors 10. This framework posits a capacity to construct meta-level representations of task difficulty from which confidence is inferred 48,65, with the expectation that any factor affecting first-order performance should be reflected in confidence judgments while metacognitive sensitivity remains preserved 10.
Our participants retrieved information in two conditions, one more demanding than the other, and the recallability of items also varied with their serial position. WM scores clearly reflected these differences. Retrieval in the low CL condition was probably perceived as easier than in the high CL condition, resulting in higher confidence. In the same vein, previous research has shown that CL also affects recall time in WM tasks 66, with shorter retrieval latencies under low than high CL, consistent with the view of latency as a measure of trace strength (i.e., the stronger the trace, the faster the response) 64,67,68. Across domains, retrospective confidence judgments (i.e., second-order responses) have been shown to correlate with first-order response times 10,65,69,70. Future research should measure response latency in each CL condition and serial position to better understand the cues underlying metacognition for WM, since retrospective confidence in this task may, as shown elsewhere in the literature, be related to the time taken to make the first-order decision.
Metacognitive evaluations can also be based on informational cues 42. Koriat 64 explains that part of the overconfidence bias in retrospective judgments is due to a priori beliefs about one's own skills, the task, or the function being tested. Interestingly, most studies on metacognition for WM have shown that participants tend to be overconfident (e.g., 37,50), and so they may hold pre-existing beliefs about how easy a task will be. In our study, overconfidence was found only for strict recall, which takes into account both the item and its position, and not for the more liberal item recall. This suggests that participants can modulate their metacognitive evaluations in keeping with task demands, and therefore that overconfidence in WM may not be as uniform as the literature suggests. However, this difference should be interpreted with caution, as no previous studies have examined this specific question.
A question for future research is how participants become aware of the effects of CL and serial position. It remains unclear whether these two factors affect recall, which in turn affects confidence, or whether they influence confidence more directly; our post-recall measure cannot differentiate these two accounts. To differentiate between these possibilities, one approach would be to gather predictions for each CL condition and serial position prior to retrieval, or even before encoding. This would allow us to determine whether individuals use their prior knowledge about WM and their expectations to form predictions, and thus whether sensitivity to CL and serial position is driven by informational cues or is a direct outcome of recall performance in each CL condition and serial position. One possibility is that serial position acts as a retrieval cue to which participants have direct access, as suggested by previous studies. For instance, participants may have information about serial position that is not captured by our strict and item confidence evaluations. Imagine someone successfully retrieves the items in positions 1, 3, 4, 5, and 6, but knows that they have forgotten the item in position 2. In this case, the participant could be sure that information was missing and sure about the appropriate cue (the serial position), and we propose that this is akin to the feeling-of-knowing in LTM 71,72. It would reflect a form of metacognition related to understanding the structure of the series and a certain 'place' in WM, whereby the serial position acts as a cue for both retrieval and metacognitive evaluations.
Another important area of study would involve examining how metacognitive judgments and their accuracy affect performance, as well as their potential relationship to strategy selection.Research has shown that higher levels of metacognition can influence the selection of strategies 73 .Interestingly, metacognitive beliefs and metacognitive experiences appear to have distinct impacts on strategy selection 74,75 .At present, our focus is primarily on monitoring and we set out to investigate metacognitive access to representations held in WM.Our results show that participants have the ability to discriminate between individual items recalled from WM, and confidence judgments reproduce standard CL and serial position effects.Metacognitive evaluations of WM are thus non-random.The extent to which this reflects direct metacognitive access to specific WM cues is a critical question for future research in this domain.
Given the emerging consensus that at least some aspects of post-decisional metacognition are domain-general 10,48,76, we would perhaps not expect anything other than accurate metacognition in WM for retrospective confidence. To explore the effect of CL, which primarily affects the encoding phase, it would be necessary to further examine the trade-off of attentional resources between maintenance, processing, and monitoring of the ongoing WM operation. As mentioned above, our experiment provides insights into metacognitive access to WM outputs, and we have cautiously suggested that retrieval fluency (as an experiential cue) may explain the patterns observed here. Examining informational cues would be another important question to address: for instance, do participants have an explicit model of typical serial position effects that informs their confidence judgments?
In conclusion, previous studies have demonstrated that participants are able to update and adjust their metacognitive judgments according to their experience during encoding in WM tasks, and that metacognitive judgments can map the effects of word length, phonological similarity, and serial position 37,39,40,50. Our results contribute to the growing body of literature 37,39,40,50 suggesting that metacognitive access to WM can occur in a similar manner to that of LTM. Additionally, the use of item-by-item judgments in our study suggests access to WM content, going beyond investigations of WM capacity.

Method
Participants
Thirty-four undergraduate students between 18 and 30 years old (mean age = 20.86) voluntarily took part in this experiment. The non-inclusion criteria were taking neuroleptics or reporting neuropsychological or psychiatric disorders. We excluded trials on which participants obtained less than 80% correct responses in the processing task. The participants received course credit for participation. All methods were carried out in accordance with relevant guidelines and regulations. This study was conducted in compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). All participants were informed about the characteristics of the study through an informed consent document that they signed before participating. The protocol was submitted to the multidisciplinary ethics committee of Grenoble Alpes University (CERGA), which issued a favorable opinion (CERGA-Avis-2021-9).

Materials
All the tasks were programmed in OpenSesame 77 and are available at https://osf.io/s879n/. The WM task is a variation of the complex span task used by Barrouillet et al. 56, in which participants had to memorize series of 6 letters while performing a processing task. For the memoranda, all consonants of the alphabet were used except "W" (which is trisyllabic in French). For the processing task, stimuli were digits from 1 to 10 presented on screen in Arabic form. Participants performed either a location judgment, corresponding to the low CL condition, or a parity judgment, corresponding to the high CL condition. The metacognitive task comprised two item-by-item confidence judgments, about item recall and strict recall respectively. Six-level ordinal scales (i.e., 0, 20, 40, 60, 80, and 100) were provided for these confidence judgments (see Fig. 2).

Procedure
The experimental phase consisted of 16 series, each composed of 6 to-be-remembered letters interleaved with a processing task. Each series started with a 750 ms fixation signal (an asterisk) centered on the screen, which was replaced by the first letter to be remembered. Each letter was displayed on the screen for 1500 ms, followed by a delay of 500 ms. After each letter, the 6 digits to be processed (i.e., distractors) appeared sequentially on screen, each appearing either in the lower or in the upper part of the screen. They were displayed for 711 ms, followed by a delay of 356 ms. Hence, the time available to process each digit was 1067 ms, resulting in a constant inter-letter interval of 6402 ms. For the high CL condition, participants had to judge the parity of the digits. They had to say "odd" (in French, "impair") and simultaneously press "S" if the digit was odd, or say "even" ("pair") and simultaneously press "L" if the digit was even. For the low CL condition, participants judged the digits' location, either pressing "S" and saying aloud "down" (in French, "bas") if the digit was located in the lower part of the screen, or pressing "L" and saying aloud "up" ("haut") if it was located in the upper part of the screen. At the end of each series, the word "Recall" appeared at the center of the screen to inform participants that they had to recall the letters in the correct order using the keyboard. Participants were told to guess if they did not remember a letter. After the recall phase of each series, the 6 recalled letters were presented again all at once, in the order given, and participants were asked to rate, one by one, their confidence in (a) item recall, by indicating how confident they were that the letter actually appeared in the series (in French "à quel point êtes-vous certain que la lettre 'X' est apparue dans cette série ?"), and (b) strict recall, by indicating how confident they were that the letter actually appeared in the series and that it had been recalled in the correct position (in French "à quel point êtes-vous certain que la lettre 'X' est apparue dans cette série ET que vous l'avez rappelée à la bonne position ?").
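The timing arithmetic above can be checked with a short script. This is a minimal Python sketch that uses only the durations stated in the Procedure; the constant and function names are illustrative. The whole-series total assumes that a full processing period follows every letter (including the last) and ignores response latencies, so it is an approximation rather than a logged duration.

```python
# Durations reported in the Procedure, in milliseconds.
FIXATION_MS = 750        # asterisk at series onset
LETTER_ON_MS = 1500      # each to-be-remembered letter
LETTER_GAP_MS = 500      # blank delay after each letter
DIGIT_ON_MS = 711        # each distractor digit
DIGIT_GAP_MS = 356       # blank delay after each digit
DIGITS_PER_LETTER = 6
LETTERS_PER_SERIES = 6


def digit_slot_ms() -> int:
    """Time available to process one digit (display + delay)."""
    return DIGIT_ON_MS + DIGIT_GAP_MS


def processing_period_ms() -> int:
    """Processing period after each letter: 6 digit slots.

    This is the 6402 ms interval the Procedure calls the inter-letter interval.
    """
    return DIGITS_PER_LETTER * digit_slot_ms()


def series_to_recall_ms() -> int:
    """Approximate time from fixation onset to the Recall prompt.

    Assumption: a full processing period follows every letter, including
    the last one; response latencies and display jitter are ignored.
    """
    per_letter = LETTER_ON_MS + LETTER_GAP_MS + processing_period_ms()
    return FIXATION_MS + LETTERS_PER_SERIES * per_letter


if __name__ == "__main__":
    print(digit_slot_ms())         # 1067
    print(processing_period_ms())  # 6402
    print(series_to_recall_ms())   # 51162
```

Under these assumptions, each series runs roughly 51 s before the recall prompt.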
The series were presented in blocks by CL condition, each block containing four series. This resulted in two high-CL blocks and two low-CL blocks, administered in a counterbalanced order. Each participant produced 96 retrospective judgments of confidence (8 series × 2 conditions × 6 letters) (see Fig. 3). Lastly, at the end of each block, participants were asked to rate their confidence in their processing-task performance, to keep them motivated. For instance, at the end of the parity judgment block, the following question appeared: "How successful do you think you were at the PARITY judgment task? [0 = 0% success, 100 = 100% success]. Enter a number between 0 and 100" (in French "à quel point considérez-vous avoir réussi la tâche de jugement de PARITE ? [0 = 0% de réussite, 100 = 100% de réussite]. Saisissez un chiffre entre 0 et 100").
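The block structure and the count of rated letters can be reproduced in a few lines. The sketch below is hypothetical: the Procedure states only that blocks were counterbalanced, so the ABBA/BAAB assignment by participant parity, the condition labels, and the function names are assumptions made for illustration.

```python
CONDITIONS = ("parity (high CL)", "location (low CL)")
SERIES_PER_BLOCK = 4
LETTERS_PER_SERIES = 6


def block_order(participant_id: int) -> list[str]:
    """Hypothetical ABBA/BAAB counterbalancing chosen by participant parity."""
    first, second = CONDITIONS if participant_id % 2 == 0 else CONDITIONS[::-1]
    return [first, second, second, first]


def letter_schedule(participant_id: int) -> list[tuple[int, str, int]]:
    """One (series_index, condition, serial_position) row per recalled letter."""
    rows = []
    series_index = 0
    for condition in block_order(participant_id):
        for _ in range(SERIES_PER_BLOCK):
            for position in range(1, LETTERS_PER_SERIES + 1):
                rows.append((series_index, condition, position))
            series_index += 1
    return rows


if __name__ == "__main__":
    rows = letter_schedule(participant_id=7)
    print(block_order(7))
    print(len(rows))  # 96
    print(sum(1 for row in rows if row[1] == CONDITIONS[0]))  # 48
```

Each row stands for one recalled letter. Four blocks of four series with six letters each give the 96 letters (48 per condition) that receive retrospective confidence ratings.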
The experimental phase was preceded by a training phase. First, participants practiced the processing task alone (thirty stimuli each for the parity and the location judgment tasks). Next, they practiced the maintenance task with a series of simple spans of 6 letters, each followed by the two item-by-item judgments of confidence. Finally, participants practiced the entire task by performing two trials identical to the experimental phase: memory task, processing task, recall, and confidence judgments.

Figure 3. Illustration of the structure of the task. (A) Complex span task under the location judgment condition. This represents the low cognitive load scenario. Participants were required to press the "L" key when the digit appeared in the upper part of the screen while saying "up"; similarly, an "S" key press was required when the digit appeared in the lower part, accompanied by saying "down". (B) Complex span task under the parity judgment condition. This represents the higher cognitive load scenario. Participants were instructed to press "S" for odd digits while saying "odd", and to press "L" for even digits while saying "even". In both conditions, participants had to recall 6 letters per series and to rate their confidence in the letters' appearance and in their positional recall.
In line with our predictions, the binomial logistic model including CL and serial position as factors revealed significant effects on item recall. Participants recalled fewer items in the parity judgment condition (M = 78.98%, SD = 19.06) than in the location judgment condition (M = 85.29%, SD = 15.85), β = −0.57, z = −4.88, p < 0.001. The same pattern occurred for the strict recall score: participants recalled fewer items in the correct position in the parity judgment condition (M = 52.65%, SD = 26.23) than in the location judgment condition (M = 65.57%, SD = 25.27), β = −0.67, z = −5.36, p < 0.001.
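Logit-scale coefficients such as those above are often easier to read as odds ratios. This minimal Python sketch assumes that the reported β and z come from a standard Wald test (z = β / SE). It converts them to odds ratios, implied standard errors, approximate 95% intervals, and two-sided p values. Because β and z are rounded in the text, the derived quantities are approximations, and they refer to the CL term as coded in the model that also includes serial position.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio corresponding to a logit-scale coefficient."""
    return math.exp(beta)


def implied_se(beta: float, z: float) -> float:
    """Standard error implied by a Wald statistic z = beta / SE."""
    return beta / z


def wald_p_two_sided(z: float) -> float:
    """Two-sided p value for a Wald z under a standard normal: erfc(|z|/sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


def odds_ratio_ci95(beta: float, se: float) -> tuple[float, float]:
    """Approximate 95% Wald interval for the odds ratio."""
    return math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


if __name__ == "__main__":
    # Reported (rounded) values for parity vs. location judgment.
    for label, beta, z in [("item recall", -0.57, -4.88),
                           ("strict recall", -0.67, -5.36)]:
        se = implied_se(beta, z)
        lo, hi = odds_ratio_ci95(beta, se)
        print(f"{label}: OR={odds_ratio(beta):.3f}, SE~{se:.3f}, "
              f"95% CI~[{lo:.3f}, {hi:.3f}], p={wald_p_two_sided(z):.1e}")
```

Under this reading, exp(−0.57) ≈ 0.566 and exp(−0.67) ≈ 0.512. Both two-sided p values fall far below 0.001, which agrees with the reported p < 0.001.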

Figure 1. Effects of cognitive load and serial position on working memory and metacognitive evaluations. (A) Item recall performance and the associated confidence levels as a function of serial position and cognitive load. (B) Strict recall performance and the associated confidence levels as a function of serial position and cognitive load. Dotted lines represent recall and solid lines represent confidence. Blue indicates the lower cognitive load condition and red the higher cognitive load condition. Error bars represent 95% confidence intervals.

Figure 2. Metacognitive sensitivity. (A) Confidence across item recall status and cognitive load conditions. (B) Confidence across strict recall status and cognitive load conditions. This figure shows that confidence is influenced by recall status and cognitive load in both WM measures (item and strict). Error bars represent 95% confidence intervals.

https://doi.org/10.1038/s41598-023-45552-y
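Figure 2 compares confidence for letters that were and were not correctly recalled. One common summary of this kind of metacognitive sensitivity is the type-2 area under the ROC curve (AUROC2). It is the probability that a correctly recalled letter receives higher confidence than an incorrectly recalled one. The Python sketch below computes it alongside mean confidence per recall status. It is offered only as an illustration, because this section does not specify which sensitivity measure the study used; the function names are hypothetical and the ratings in the usage example are made up.

```python
def type2_auroc(confidence: list[float], correct: list[bool]) -> float:
    """Type-2 AUROC: probability that a correct response receives higher
    confidence than an incorrect one (ties count as 0.5).

    0.5 means confidence carries no information about accuracy; 1.0 is perfect.
    """
    if len(confidence) != len(correct):
        raise ValueError("confidence and correct must have the same length")
    hits = [c for c, ok in zip(confidence, correct) if ok]
    errors = [c for c, ok in zip(confidence, correct) if not ok]
    if not hits or not errors:
        raise ValueError("need at least one correct and one incorrect response")
    score = 0.0
    for h in hits:
        for e in errors:
            if h > e:
                score += 1.0
            elif h == e:
                score += 0.5
    return score / (len(hits) * len(errors))


def mean_confidence_by_status(confidence: list[float],
                              correct: list[bool]) -> dict[bool, float]:
    """Mean confidence for correctly (True) and incorrectly (False) recalled letters."""
    groups: dict[bool, list[float]] = {True: [], False: []}
    for c, ok in zip(confidence, correct):
        groups[ok].append(c)
    return {status: sum(v) / len(v) for status, v in groups.items() if v}


if __name__ == "__main__":
    # Made-up ratings for six letters of one series.
    conf = [90, 85, 40, 50, 60, 20]
    ok = [True, True, True, False, True, False]
    print(type2_auroc(conf, ok))                # 0.875
    print(mean_confidence_by_status(conf, ok))  # {True: 68.75, False: 35.0}
```

The pairwise formulation is quadratic in the number of trials, which is unproblematic at the scale of 96 rated letters per participant.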

Table 1. Recall and cognitive load as predictors of confidence. All p values were obtained through Wald tests using the ordinal package in R.
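Table 1 reports confidence models fitted with the ordinal package in R. For readers unfamiliar with cumulative link models, the Python sketch below shows how such a model maps a linear predictor η onto rating-category probabilities, using the logit parameterization documented for `clm`: logit P(Y ≤ j) = θ_j − η. The thresholds and η values are hypothetical, and this section does not give the exact model specification (link function, random effects, number of confidence categories).

```python
import math


def logistic(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds: list[float], eta: float) -> list[float]:
    """P(Y = j) for j = 1..K under a cumulative logit model.

    logit P(Y <= j) = theta_j - eta, with K-1 strictly increasing thresholds.
    A larger eta shifts probability mass toward higher categories.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs


def expected_category(probs: list[float]) -> float:
    """Mean category index (1-based) implied by the probabilities."""
    return sum((j + 1) * p for j, p in enumerate(probs))


if __name__ == "__main__":
    thresholds = [-2.0, -0.5, 1.0, 2.5]  # hypothetical, 5 categories
    for eta in (0.0, 1.0):  # hypothetical linear-predictor values
        probs = category_probabilities(thresholds, eta)
        print(eta, [round(p, 3) for p in probs], round(expected_category(probs), 3))
```

In this parameterization a positive coefficient (for example, for correctly recalled letters) raises η and therefore the probability of higher confidence ratings. The Wald z for each coefficient is its estimate divided by its standard error.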